semi-markov decision process
Model-based Reinforcement Learning for Semi-Markov Decision Processes with Neural ODEs
We present two elegant solutions for modeling continuous-time dynamics with neural ordinary differential equations (ODEs), in a novel model-based reinforcement learning (RL) framework for semi-Markov decision processes (SMDPs). Our models accurately characterize continuous-time dynamics and enable us to develop high-performing policies using a small amount of data. We also develop a model-based approach for optimizing time schedules, which reduces the interaction rate with the environment while maintaining near-optimal performance; this is not possible for model-free methods. We experimentally demonstrate the efficacy of our methods across various continuous-time domains.
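To make the idea of a continuous-time SMDP transition concrete, here is a minimal sketch, not the authors' method. It assumes a hand-written dynamics function `toy_dynamics` standing in for a learned neural ODE, and a fixed-step Runge-Kutta integrator in place of whatever solver the paper uses. The names `rk4_step`, `smdp_transition`, and `toy_dynamics` are hypothetical.

```python
from typing import Callable, List

State = List[float]
Dynamics = Callable[[State, float], State]


def rk4_step(f: Dynamics, x: State, a: float, h: float) -> State:
    """One classical fourth-order Runge-Kutta step of size h for dx/dt = f(x, a)."""
    k1 = f(x, a)
    k2 = f([xi + 0.5 * h * k for xi, k in zip(x, k1)], a)
    k3 = f([xi + 0.5 * h * k for xi, k in zip(x, k2)], a)
    k4 = f([xi + h * k for xi, k in zip(x, k3)], a)
    return [
        xi + (h / 6.0) * (c1 + 2.0 * c2 + 2.0 * c3 + c4)
        for xi, c1, c2, c3, c4 in zip(x, k1, k2, k3, k4)
    ]


def smdp_transition(f: Dynamics, x: State, a: float, tau: float,
                    n_steps: int = 100) -> State:
    """Hold action a for a variable duration tau and return the resulting state.

    In an SMDP the time between decisions is not fixed, so tau is an input
    here rather than a constant step size.
    """
    h = tau / n_steps
    for _ in range(n_steps):
        x = rk4_step(f, x, a, h)
    return x


def toy_dynamics(x: State, a: float) -> State:
    """Assumed stand-in for a learned model: dx/dt = -x + a."""
    return [-xi + a for xi in x]


# Predict the state after holding action 0.0 for one time unit.
x_next = smdp_transition(toy_dynamics, [1.0], a=0.0, tau=1.0)
```

Because `tau` is an argument, a planner built on such a model could compare candidate durations before acting, which is the kind of time-schedule optimization the abstract describes as unavailable to model-free methods.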
Average-reward reinforcement learning in semi-Markov decision processes via relative value iteration
Yu, Huizhen, Wan, Yi, Sutton, Richard S.
This paper applies the authors' recent results on asynchronous stochastic approximation (SA) in the Borkar-Meyn framework to reinforcement learning in average-reward semi-Markov decision processes (SMDPs). We establish the convergence of an asynchronous SA analogue of Schweitzer's classical relative value iteration algorithm, RVI Q-learning, for finite-space, weakly communicating SMDPs. In particular, we show that the algorithm converges almost surely to a compact, connected subset of solutions to the average-reward optimality equation, with convergence to a unique, sample path-dependent solution under additional stepsize and asynchrony conditions. Moreover, to make full use of the SA framework, we introduce new monotonicity conditions for estimating the optimal reward rate in RVI Q-learning. These conditions substantially expand the previously considered algorithmic framework and are addressed through novel arguments in the stability and convergence analysis of RVI Q-learning.
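For orientation, the average-reward optimality equation mentioned above is commonly written for SMDPs in the following state-action form. The notation is an assumption and is not taken from the abstract: $r(s,a)$ is the expected reward accumulated during a transition, $\tau(s,a)$ the expected holding time, $p(s' \mid s,a)$ the transition probability, and $\rho^{*}$ the optimal reward rate.

```latex
q(s,a) \;=\; r(s,a) \;-\; \rho^{*}\,\tau(s,a) \;+\; \sum_{s'} p(s' \mid s,a)\,\max_{a'} q(s',a')
```

Setting $\tau(s,a) \equiv 1$ recovers the familiar MDP equation. Adding a constant to $q$ yields another solution, so solutions are not unique, which is consistent with the abstract describing convergence to a set of solutions.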
Review for NeurIPS paper: Model-based Reinforcement Learning for Semi-Markov Decision Processes with Neural ODEs
Summary and Contributions: The paper proposes a method for utilizing ODEs to represent dynamics for continuous-time decision-making problems. The authors also aim to fill a perceived gap in the literature on deep RL for continuous-time problems, where most publications are model-free and discretize time if it is continuous. They claim that their approach leads to lower dependence on vast amounts of training data and better performance, and that the model-based approach is well-founded. I tend to agree, although this is not exactly my area. I also believe that connecting ODEs and other explicit models is critical for extending RL methods to important problems in physics, chemistry, epidemiology and population modelling.
Increasing Information for Model Predictive Control with Semi-Markov Decision Processes
Boucher, Rémy Hosseinkhan, Semeraro, Onofrio, Mathelin, Lionel
Recent work in Learning-Based Model Predictive Control of dynamical systems shows impressive sample complexity, using criteria from Information Theory to accelerate the learning procedure. However, the sequential exploration opportunities are limited by the system's local state, which restricts the amount of information in the observations from the current exploration trajectory. This article resolves this limitation by introducing temporal abstraction through the framework of Semi-Markov Decision Processes. The framework increases the total information of the gathered data for a fixed sampling budget, thus reducing the sample complexity.
Reinforcement Learning in a Physics-Inspired Semi-Markov Environment
Bellinger, Colin, Coles, Rory, Crowley, Mark, Tamblyn, Isaac
Reinforcement learning (RL) has been demonstrated to have great potential in many applications of scientific discovery and design. Recent work includes, for example, the design of new structures and compositions of molecules for therapeutic drugs. Much of the existing work related to the application of RL to scientific domains, however, assumes that the available state representation obeys the Markov property. For reasons associated with time, cost, sensor accuracy, and gaps in scientific knowledge, many scientific design and discovery problems do not satisfy the Markov property. Thus, something other than a Markov decision process (MDP) should be used to plan or find the optimal policy. In this paper, we present a physics-inspired semi-Markov RL environment, namely the phase change environment. In addition, we evaluate the performance of value-based RL algorithms for both MDPs and partially observable MDPs (POMDPs) on the proposed environment. Our results demonstrate that deep recurrent Q-networks (DRQNs) significantly outperform deep Q-networks (DQNs), and that DRQNs benefit from training with hindsight experience replay. Implications for the use of semi-Markovian RL and POMDPs for scientific laboratories are also discussed.